formative feedback
Scaling Equitable Reflection Assessment in Education via Large Language Models and Role-Based Feedback Agents
Formative feedback is widely recognized as one of the most effective drivers of student learning, yet it remains difficult to implement equitably at scale. In large or low-resource courses, instructors often lack the time, staffing, and bandwidth required to review and respond to every student reflection, creating gaps in support precisely where learners would benefit most. This paper presents a theory-grounded system that uses five coordinated role-based LLM agents (Evaluator, Equity Monitor, Metacognitive Coach, Aggregator, and Reflexion Reviewer) to score learner reflections with a shared rubric and to generate short, bias-aware, learner-facing comments. The agents first produce structured rubric scores, then check for potentially biased or exclusionary language, add metacognitive prompts that invite students to think about their own thinking, and finally compose a concise feedback message of at most 120 words. The system includes simple fairness checks that compare scoring error across lower and higher scoring learners, enabling instructors to monitor and bound disparities in accuracy. We evaluate the pipeline in a 12-session AI literacy program with adult learners. In this setting, the system produces rubric scores that approach expert-level agreement, and trained graders rate the AI-generated comments as helpful, empathetic, and well aligned with instructional goals. Taken together, these results show that multi-agent LLM systems can deliver equitable, high-quality formative feedback at a scale and speed that would be impossible for human graders alone. More broadly, the work points toward a future where feedback-rich learning becomes feasible for any course size or context, advancing long-standing goals of equity, access, and instructional capacity in education.
Calibrated Generative AI as Meta-Reviewer: A Systemic Functional Linguistics Discourse Analysis of Reviews of Peer Reviews
Zapata, Gabriela C., Cope, Bill, Kalantzis, Mary, Searsmith, Duane
This study investigates the use of generative AI to support formative assessment through machine generated reviews of peer reviews in graduate online courses in a public university in the United States. Drawing on Systemic Functional Linguistics and Appraisal Theory, we analyzed 120 metareviews to explore how generative AI feedback constructs meaning across ideational, interpersonal, and textual dimensions. The findings suggest that generative AI can approximate key rhetorical and relational features of effective human feedback, offering directive clarity while also maintaining a supportive stance. The reviews analyzed demonstrated a balance of praise and constructive critique, alignment with rubric expectations, and structured staging that foregrounded student agency. By modeling these qualities, AI metafeedback has the potential to scaffold feedback literacy and enhance leaner engagement with peer review.
Feedback Indicators: The Alignment between Llama and a Teacher in Language Learning
Rรผdian, Sylvio, Elsir, Yassin, Kretschmer, Marvin, Cayrou, Sabine, Pinkwart, Niels
Automated feedback generation has the potential to enhance students' learning progress by providing timely and targeted feedback. Moreover, it can assist teachers in optimizing their time, allowing them to focus on more strategic and personalized aspects of teaching. To generate high-quality, information-rich formative feedback, it is essential first to extract relevant indicators, as these serve as the foundation upon which the feedback is constructed. Teachers often employ feedback criteria grids composed of various indicators that they evaluate systematically. This study examines the initial phase of extracting such indicators from students' submissions of a language learning course using the large language model Llama 3.1. Accordingly, the alignment between indicators generated by the LLM and human ratings across various feedback criteria is investigated. The findings demonstrate statistically significant strong correlations, even in cases involving unanticipated combinations of indicators and criteria. The methodology employed in this paper offers a promising foundation for extracting indicators from students' submissions using LLMs. Such indicators can potentially be utilized to auto-generate explainable and transparent formative feedback in future research.
Personalised Feedback Framework for Online Education Programmes Using Generative AI
Kuzminykh, Ievgeniia, Nawaz, Tareita, Shenzhang, Shihao, Ghita, Bogdan, Raphael, Jeffery, Xiao, Hannan
AI tools, particularly large language modules, have recently proven their effectiveness within learning management systems and online education programmes. As feedback continues to play a crucial role in learning and assessment in schools, educators must carefully customise the use of AI tools in order to optimally support students in their learning journey. Efforts to improve educational feedback systems have seen numerous attempts reflected in the research studies but mostly have been focusing on qualitatively benchmarking AI feedback against human-generated feedback. This paper presents an exploration of an alternative feedback framework which extends the capabilities of ChatGPT by integrating embeddings, enabling a more nuanced understanding of educational materials and facilitating topic-targeted feedback for quiz-based assessments. As part of the study, we proposed and developed a proof of concept solution, achieving an efficacy rate of 90% and 100% for open-ended and multiple-choice questions, respectively. The results showed that our framework not only surpasses expectations but also rivals human narratives, highlighting the potential of AI in revolutionising educational feedback mechanisms.
Automated Assessment and Adaptive Multimodal Formative Feedback Improves Psychomotor Skills Training Outcomes in Quadrotor Teleoperation
Jensen, Emily, Sankaranarayanan, Sriram, Hayes, Bradley
The workforce will need to continually upskill in order to meet the evolving demands of industry, especially working with robotic and autonomous systems. Current training methods are not scalable and do not adapt to the skills that learners already possess. In this work, we develop a system that automatically assesses learner skill in a quadrotor teleoperation task using temporal logic task specifications. This assessment is used to generate multimodal feedback based on the principles of effective formative feedback. Participants perceived the feedback positively. Those receiving formative feedback viewed the feedback as more actionable compared to receiving summary statistics. Participants in the multimodal feedback condition were more likely to achieve a safe landing and increased their safe landings more over the experiment compared to other feedback conditions. Finally, we identify themes to improve adaptive feedback and discuss and how training for complex psychomotor tasks can be integrated with learning theories.
How Well Can You Articulate that Idea? Insights from Automated Formative Assessment
Karizaki, Mahsa Sheikhi, Gnesdilow, Dana, Puntambekar, Sadhana, Passonneau, Rebecca J.
Automated methods are becoming increasingly integrated into studies of formative feedback on students' science explanation writing. Most of this work, however, addresses students' responses to short answer questions. We investigate automated feedback on students' science explanation essays, where students must articulate multiple ideas. Feedback is based on a rubric that identifies the main ideas students are prompted to include in explanatory essays about the physics of energy and mass, given their experiments with a simulated roller coaster. We have found that students generally improve on revised versions of their essays. Here, however, we focus on two factors that affect the accuracy of the automated feedback. First, we find that the main ideas in the rubric differ with respect to how much freedom they afford in explanations of the idea, thus explanation of a natural law is relatively constrained. Students have more freedom in how they explain complex relations they observe in their roller coasters, such as transfer of different forms of energy. Second, by tracing the automated decision process, we can diagnose when a student's statement lacks sufficient clarity for the automated tool to associate it more strongly with one of the main ideas above all others. This in turn provides an opportunity for teachers and peers to help students reflect on how to state their ideas more clearly.
"Close...but not as good as an educator." -- Using ChatGPT to provide formative feedback in large-class collaborative learning
Ponte, Cory Dal, Dushyanthen, Sathana, Lyons, Kayley
Delivering personalised, formative feedback to multiple problem-based learning groups in a short time period can be almost impossible. We employed ChatGPT to provide personalised formative feedback in a one-hour Zoom break-out room activity that taught practicing health professionals how to formulate evaluation plans for digital health initiatives. Learners completed an evaluation survey that included Likert scales and open-ended questions that were analysed. Half of the 44 survey respondents had never used ChatGPT before. Overall, respondents found the feedback favourable, described a wide range of group dynamics, and had adaptive responses to the feedback, yet only three groups used the feedback loop to improve their evaluation plans. Future educators can learn from our experience including engineering prompts, providing instructions on how to use ChatGPT, and scaffolding optimal group interactions with ChatGPT. Future researchers should explore the influence of ChatGPT on group dynamics and derive design principles for the use of ChatGPT in collaborative learning.
Augmenting Flight Training with AI to Efficiently Train Pilots
Guevarra, Michael, Das, Srijita, Wayllace, Christabel, Epp, Carrie Demmans, Taylor, Matthew E., Tay, Alan
We propose an AI-based pilot trainer to help students learn how to fly aircraft. First, an AI agent uses behavioral cloning to learn flying maneuvers from qualified flight instructors. Later, the system uses the agent's decisions to detect errors made by students and provide feedback to help students correct their errors. This paper presents an instantiation of the pilot trainer. We focus on teaching straight and level flying maneuvers by automatically providing formative feedback to the human student.
Developing Pedagogically-Guided Threshold Algorithms for Intelligent Automated Essay Feedback
Roscoe, Rod D. (Arizona State University) | Kugler, Danica (Arizona State University) | Crossley, Scott A. (Georgia State University) | Weston, Jennifer L. (Arizona State University) | McNamara, Danielle S. (Arizona State University)
Grimes and Warschauer (2010) describe two accuracy (Warschauer & Ware, 2006), there have been kinds of systems: automated essay scoring (AES) and relatively few evaluations of student improvement (e.g., automated writing evaluation (AWE). AES systems strive Kellogg, Whiteford, & Quinlan, 2010) or the role of to assign accurate and reliable scores to essays or specific feedback (e.g., Roscoe, Varner, Cai, Weston, Crossley, & writing features (e.g., mechanics). Scores are generated McNamara, 2011). Hence, in this paper, we explore and using various artificial intelligence (AI) methods, including describe a method for developing pedagogically-guided statistical modeling, natural language processing (NLP), algorithms that guide formative feedback in an intelligent and Latent Semantic Analysis (LSA) (Shermis & Burstein, tutor system (ITS) for writing.